
Conversation

@bryce13950
Collaborator

Description

Please include a summary of the change and which issue is fixed. Please also include relevant motivation and context. List any dependencies that are required for this change.

Fixes # (issue)

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Screenshots

Please attach before and after screenshots of the change if applicable.

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

bryce13950 and others added 8 commits September 21, 2025 17:57
* created individual processing functions

* extracted state dict and inserted back into instance after processing

* created weight processing shared class

* added test coverage for new functions

* updated hooked transformer to use new shared functions

* created test

* moved over weight processing

* replaced keys

* used the correct function

* created test for making sure path translation works correctly

* fixed weight processing

* added additional tests

* formatted tests a bit

* cleaned up

* fixed unit test

* fixed indentation

* fixed doc string

* fixed unit test

* fixed type

* fixed some tests

* fixed test

* fixed setup of tests

* cleaned up test

* started working through individual matches

* added test coverage

* tested function a bit

* integrated weight conversion into weight processing

* simplified functions

* identified individual problem lines

* identified divergences more clearly

* brought back error lines
* improved accuracy a bit

* got models to match

* removed forward pass stuff

* cleaned up weight processing a bit

* removed working attention

* restored files

* created loop to verify weight conversion

* finished compatibility layer

* finished testing hugging face weights

* setup correct init

* added some tests

* removed separate component

* fixed some integration tests
@bryce13950 bryce13950 changed the title Dev 3.x folding Transformer bridge layer norm folding Sep 27, 2025
bryce13950 and others added 21 commits September 29, 2025 22:33

* fixed typing issue

* fixed typing and format issues

* fixed ci issues

* ran format

* fixed mypy issues

* removed extra file

* removed old scripts

* tested format

* fixed some tests

* ran format

* fixed tests

* fixed acceptance tests

* fixed some more tests

* synced functionality completely

* reduced old references

* removed remaining references

* moved forward functions

* removed forward

* tested various forwards

* worked on getting original forwards back into place

* added more coverage

* cleaned up model

* git status

* Fix automatic weight extraction to use reference HookedTransformer

This restores the working weight extraction mechanism that creates a reference
HookedTransformer internally and extracts exact processed weights for perfect
compatibility with ablation studies.
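The pattern restored here can be sketched in miniature (all names and the centering step below are illustrative stand-ins, not the actual TransformerLens API):

```python
# Toy sketch: run weight processing on a reference model's state dict,
# then copy the exact processed values into the bridge model.
# `process_weights` stands in for the real folding/centering passes.

def process_weights(state_dict):
    """Illustrative processing step: center each weight vector to zero mean."""
    processed = {}
    for name, weights in state_dict.items():
        mean = sum(weights) / len(weights)
        processed[name] = [w - mean for w in weights]
    return processed

def extract_into_bridge(reference_state_dict):
    """Extract processed weights from the reference so the bridge ends up
    with bit-identical values, keeping ablation studies comparable."""
    return dict(process_weights(reference_state_dict))

bridge_weights = extract_into_bridge({"blocks.0.mlp.W_in": [1.0, 2.0, 3.0]})
print(bridge_weights["blocks.0.mlp.W_in"])  # [-1.0, 0.0, 1.0]
```

Because the bridge adopts the reference's processed values exactly rather than re-deriving them, any numerical divergence between the two models is eliminated by construction.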

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>

* moved embed stuff from bridge

* moved MLP stuff

* cleaned up a bit

* cleaned up a bit

* removed extra block

* created pos embed bridge

* fixed unembed

---------

Co-authored-by: Claude <[email protected]>
* moved final layer norm

* moved layer norm forward

* cleaned up more things

* updated attention weight loading

* fixed function names
* fixed some ci issues

* fixed type issues

* ran format

* fixed test

* fixed type issues

* fixed type issue

* fixed type issue

* fixed test

* fixed test

* fixed issues

* ran format

* fixed typing

* fixed tests

* fixed tests

* simplified test

* sped up tests

* added check for kv cache

* ran format

* skipped some tests

* marked a couple tests to skip

* ran some more optimizations

* ran poetry lock

* regenerated lock

* fixed commands

* set random seed

* updated parallelism prop

* updated command

* reverted some changes

* updated notebook settings

* updated verbosity

* removed extra test

* cleaned up tests some more

* marked test as skipped

* fixed more tests

* sped up CI

* reverted CI changes

* reverted actions changes

* improved cache

* sped up some tests

* optimized more tests

* sped up some more tests

* made more speed improvements

* fixed error

* fixed typing
* cleaned up some debug points

* fixed attention hooks

* enabled hooks in test
* split out some tasks into their own jobs

* removed bad file

* updated name
* fixed batch dimension

* removed log point

* fixed potential error

* sped up load

* ran format

* improved hf cache handling

* fixed bridge

* fixed cache again

* added more checks

* removed parallel execution
* fixed cache hooks

* fixed test and typing

* fixed test
* fixed bias displaying

* fixed ablation issue

* fixed type issue
* setup new hooks properly

* fixed type checks
* fixed alias hook props

* ran format
* made all hooks show properly

* ran format

* fixed type checks
* updated loading in main demo to use transformers bridge

* updated model name

* updated imports

* updated some cells

* reran demo

* updated some cells

* reran some cells

* reran demo

* ran demo again

* finished generating new cells
* Update README.md (#957)

Update link to Streamlit tutorial and guide.

Co-authored-by: Bryce Meyer <[email protected]>

* improve model properties table in docs (#769)

* add static to gitignore

* making a meaningless change to see if tests pass at all

* making a meaningless change to see if tests pass at all

* add interactive table static html only

adding things one at a time to see what causes things to break

* run poetry update with no changes to deps

* revert lockfile change

* add tiktoken >=0.7.0 to group docs

* add dep muutils >=0.6.15 to group docs

* add improved interactive table generation

we still generate a plain markdown table

code is from the old PR: https://github.com/mivanit/TransformerLens/blob/add-better-model-properties-table/docs/make_docs.py
which is in turn a modified version of https://github.com/mivanit/transformerlens-model-table

* fix format -- missing trailing newline

* fix type hints for compatibility

* fix torch device meta in make docs script, also improved hot reload

* TEMPORARY: allow_except when getting models to deal with mixtral HF_TOKEN issue

* added simple test for get_model_info

* context manager for controlling device, tests were breaking due to default device meta

* formatted with wrong version of black, oops

* fix path to generated model_properties_table

* fix md table header, add title in yaml frontmatter

* add line to frontmatter yaml, re-run tests bc huggingface down?

* do not allow exceptions when getting models

* re-run poetry lock

* attempt fix lockfile

* re-run poetry lock

---------

Co-authored-by: Bryce Meyer <[email protected]>

* switch pyproject from toml to uv, generate lockfile

also update tiktoken dep for 3.13 compatibility

* update makefile to use uv

* update actions

* hack to get version to work

* wip

* make dep

* update contributing.md to reflect switch from poetry to uv

* add type hints to supported_models

* fix paths in make_docs.py

* docs group not in default, update install instructions for docs

* POETRY_PYPI_TOKEN_PYPI -> PYPI_TOKEN_PYPI

* make format

* fix default groups, re-add docs

* add some deps needed in notebooks

* removed use of torchtyping in othello_GPT.ipynb and deps

- torchtyping causes various issues if it's imported
- presumably jaxtyping should be used instead??
- othello GPT notebook doesn't actually use the imported TT
  - shouldn't a linter/formatter catch this sort of unused import?

* fix: add pythonpath "." to pytest config for test imports

Configure pytest to include project root in Python path, enabling
`from tests.foo import bar`
style imports, which were broken by switching to uv
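Assuming the project keeps its pytest settings in pyproject.toml (typical after a uv migration), the fix is a single ini option; `pythonpath` is supported natively by pytest 7.0 and later:

```toml
[tool.pytest.ini_options]
# Put the project root on sys.path so `from tests.foo import bar` resolves.
pythonpath = ["."]
```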

* attempt jupyter issue fix

* issue ref explaining ipython version restriction

* updated ci commands after recent work

* fixed more setup items

* added tabulate dependency

* updated make docs command

* updated dependencies

* fixed docs

---------

Co-authored-by: jmole <[email protected]>
Co-authored-by: Bryce Meyer <[email protected]>
* setup tests for hooks

* ran format

* merged legacy hooks tests

* ran format

* enabled compatibility mode

* added remaining hooks

* fixed type issue

* added main demo cached output

* removed debug items

* reran notebook

* marked cell for skipping

* reran notebook

* regenerated demo

* regenerated notebook
* updated loading in arena content demo to use transformer bridge

* updated install reference

* removed extra params

* ran some cells

* updated arena notebook

---------

Co-authored-by: Bryce Meyer <[email protected]>
* regenerated with new hooks

* ran first cell
* added test coverage for ensuring compatibility

* ran format

* fixed unit tests

* resolved type issue

* added init files

* added init file

* fixed tokenize function

* fixed attention mask issues

* reverted invalid change to test
bryce13950 and others added 12 commits October 17, 2025 11:23
* created test that asserts hook shapes for various models

* created initial doc for explaining transformer bridge model structure

* ran format

* cleaned up docs and enabled test

* ran format

* improved hook test

* ran format

* did some memory cleanup

* added more hook shape coverage

* made more optimizations

* fixed type checking
* cleaned up comments

* removed additional comments

* fixed gradient scale

* ran format

* resolved merge issue

* added test for all forward passes

* fixed hook z shape

* matched attention hook

* updated function location

* created backward test

* worked through some backwards hooks

* resolved backwards issue

* fixed type issues

* fixed docstring test

* fixed docstring

* fixed tests

* improved hook compatibility

* split weights before for joint qkv

* updated the way split qkv works

* aliased hook properly

* lowered test tolerance

* updated tolerance

* fixed type checks

* improved gradients

* revisited test

* fixed type checks

* added hook to block

* updated gemma 3

* fixed gemma 3

* fixed format

* excluded remaining hooks

* regenerated arena demo

* ran format

* reran demo

* fixed shape issue

* regenerated main demo

* added ignore block

* reran cell

* fixed block

* fixed ci bugs

* fixed test

* updated tolerance

* revised rates

* backed off on tolerance

* fixed utils

* fixed dtype
* tested llama 3.1

* removed extra file

* fixed eps access

* brought rotary_emb back

* added param

* connected correct bridge
* fixed hook duplication

* moved tests

* fixed new tests

* ran format

* regenerated demo

* tightened tolerance a bit

* Revert "regenerated demo"

This reverts commit 50adba0.

* fixed cache location bug

* reverted test

* removed comment

* cleaned up fix

* cleaned up fix

* cleaned up hook_out

* restored shape checking

* cleaned up solution more

* regenerated demo

* reran cell

* regenerated main demo

* regenerated demo

* ran initial gemma 2 fix

* connected correct bridge

* changed to hook cos and sin

* passed test

* updated embedding implementation

* reverted embedding changes

* updated return

* updated param order

* skipped test in CI

* skipped test properly
* resolved experts mapping issues

* ran format

* Fix GPT-OSS JointGateUpMLPBridge hook alias resolution and add tests

This commit addresses hook alias resolution issues for GPT-OSS MoE models
and adds comprehensive unit tests.

Changes:
1. Fixed JointGateUpMLPBridge hook_aliases to use gate.hook_out instead of
   in.hook_out/input.hook_out, which don't exist in this bridge type
2. Added 7 comprehensive unit tests in test_gpt_oss_moe.py that verify:
   - Model loads without downloading weights (using meta device)
   - Bridge creation works correctly
   - MLP uses JointGateUpMLPBridge (not regular MLPBridge)
   - Compatibility mode hooks are accessible
   - Experts structure is correct (batched tensors, not iterable modules)
   - Hook aliases resolve correctly
   - No incorrect BlockBridge wrapper around experts

Root cause:
- JointGateUpMLPBridge inherits from MLPBridge which has hook_aliases
  expecting in.hook_out or input.hook_out submodules
- JointGateUpMLPBridge creates gate and up submodules instead, causing
  AttributeError when resolving aliases
- Solution: Override hook_aliases at class level to use gate.hook_out

Testing:
All 7 tests pass, verifying GPT-OSS loads correctly and hooks work in
compatibility mode without downloading the full 20B parameter model.
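The class-level override is plain Python attribute shadowing; a minimal toy sketch (the bridge classes here are simplified stand-ins, and the alias names are illustrative):

```python
# Toy illustration of overriding an inherited class-level alias map.
# MLPBridge / JointGateUpMLPBridge are simplified stand-ins for the
# real bridge classes.

class MLPBridge:
    # Base class assumes an `in` submodule exists on the bridge.
    hook_aliases = {"hook_pre": "in.hook_out"}

class JointGateUpMLPBridge(MLPBridge):
    # This bridge creates `gate` and `up` submodules instead of `in`,
    # so resolving the inherited alias would raise AttributeError.
    # Overriding at class level repoints the alias at a real submodule.
    hook_aliases = {"hook_pre": "gate.hook_out"}

print(JointGateUpMLPBridge.hook_aliases["hook_pre"])  # gate.hook_out
```

Because the override lives on the class rather than the instance, every `JointGateUpMLPBridge` resolves the corrected alias without any per-instance patching.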

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>

* added model name to available models

* added eps config

* updated to rotary bridge

* passed through to parent

* fixed tuple passing

* changed oss bridge

* fixed gpt oss activations

* ran format

* removed colab compat from checks

---------

Co-authored-by: Claude <[email protected]>
* created benchmark suite

* fixed type issues

* fixed type issues
* finalized t5 adapter

* tested t5 architecture

* fixed type issues

* resolved experts mapping issues

* Revert "resolved experts mapping issues"

This reverts commit 9fa5125.

* ran format
* improved various models

* improved llama

* fixed benchmark utils now

* fixed test
bryce13950 and others added 5 commits November 6, 2025 04:01

* decoupling weight processing completely from hooked transformer

* fixed weight processing issues

* fixed test

* fixed tests

* fixed forward tests

* updated test

* fixed last test

* fixed format

* ran format

* added whitespace

---------

Co-authored-by: Claude <[email protected]>
* finalized benchmark logic

* ran format
* improved various models

* improved llama

* fixed benchmark utils now

* fixed test

* fixed opt adapter

* fixed phi-3

* fixed format

* fixed type issues

* added line break

* fixed phi-3
